P15.2 Fortgeschrittenes Praxisprojekt
Dr. Nicolas Ferry - Bavarian National Forest Park / Daniel Schlichting - StabLab
31 Jan 2025
Model FCM levels - amongst other covariates - on spatial and temporal distance to hunting activities
Expectations:
Contains information on 809 faecal samples, including:
Samples were taken at irregular time intervals from 2020 to 2022.
Deer location at the time of hunting event is approximated by linear interpolation:
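The interpolation step can be sketched as follows. This is a minimal illustration, not the project's code: the function name, the `(timestamp, x, y)` tuple layout, and the use of projected metric coordinates are all assumptions, and the numbers are made up.

```python
from datetime import datetime


def interpolate_position(t_hunt, fix_before, fix_after):
    """Linearly interpolate a deer's (x, y) position at time t_hunt.

    fix_before and fix_after are (timestamp, x, y) tuples for the two GPS
    fixes that bracket the hunting event (hypothetical layout; coordinates
    are assumed to be in a projected, metric coordinate system).
    """
    t0, x0, y0 = fix_before
    t1, x1, y1 = fix_after
    if not t0 <= t_hunt <= t1:
        raise ValueError("t_hunt must lie between the two fixes")
    span = (t1 - t0).total_seconds()
    if span == 0:
        return x0, y0
    # Fraction of the interval that has elapsed at the time of the hunt.
    w = (t_hunt - t0).total_seconds() / span
    return x0 + w * (x1 - x0), y0 + w * (y1 - y0)


# Illustrative values only, not taken from the project data.
before = (datetime(2021, 10, 1, 8, 0), 1000.0, 2000.0)
after = (datetime(2021, 10, 1, 10, 0), 1400.0, 2600.0)
position = interpolate_position(datetime(2021, 10, 1, 8, 30), before, after)
```

The further apart the two bracketing fixes are in time, the less reliable a straight-line approximation of the deer's path is likely to be.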
A hunting event is considered relevant to a faecal sample if
Among the relevant hunting events, the most relevant one is defined by one of the three proximity criteria:
We define the scoring function as follows:
\[ S(d, t) \propto \begin{cases} \frac{1}{d^2} \cdot f_t(t), \quad t \sim \mathcal{N}(\mu, \sigma^2) & \text{if } t \leq \mu \\ \frac{1}{d^2} \cdot f_t(t), \quad t \sim \mathrm{Laplace}(\mu, b) & \text{if } t > \mu \end{cases} \] where:
\[ \begin{aligned} d & \text{: Distance} \\ t & \text{: Time difference} \\ \mu & \text{: GRT target = 19 hours} \end{aligned} \]
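A minimal Python sketch of this score, up to the proportionality constant, might look as follows. The values of `sigma` and `b` are placeholders chosen for illustration, since the slides do not state them; only `MU = 19` hours comes from the definition above.

```python
import math

MU = 19.0  # GRT target in hours, from the definition above


def normal_pdf(t, mu, sigma):
    """Density of N(mu, sigma^2) at t."""
    z = (t - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def laplace_pdf(t, mu, b):
    """Density of Laplace(mu, b) at t."""
    return math.exp(-abs(t - mu) / b) / (2.0 * b)


def score(d, t, mu=MU, sigma=4.0, b=8.0):
    """Unnormalised score S(d, t): inverse squared distance times f_t(t).

    sigma and b are hypothetical placeholder values, not the project's.
    """
    if d <= 0:
        raise ValueError("distance must be positive")
    # Normal density up to the GRT target, Laplace density after it.
    f_t = normal_pdf(t, mu, sigma) if t <= mu else laplace_pdf(t, mu, b)
    return f_t / d ** 2
```

With these placeholder values the two branches do not meet at \(t = \mu\); whether the original implementation rescales them so that they do is not stated here. The Laplace branch decays more slowly than the normal branch, so events longer ago than the GRT target are penalised less sharply than events more recent than it.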
The marginal effects of distance and elapsed time since challenge on the score:
We propose three datasets for modelling:
| Dataset | GRT low (h) | GRT high (h) | Distance Threshold | Proximity Criterion | Deer | Observations |
|---|---|---|---|---|---|---|
| 1 | 0 | 36 | 10 | closest in time | 35 | 149 |
| 2 | 0 | 36 | 10 | nearest | 35 | 147 |
| 3 | 0 | 200 | 15 | score | 36 | 223 |
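
How one sample's candidate events could be filtered and reduced to a single most relevant event is sketched below. Several points are assumptions: the dictionary keys, inclusive window bounds, reading "closest in time" as the smallest time difference, and the function name `select_event`. The scoring function is passed in by the caller.

```python
def select_event(events, grt_low, grt_high, max_distance, criterion, score_fn=None):
    """Return (most relevant event, number of other relevant events).

    events: list of dicts with 'time_diff' (hours between hunting event and
    sample) and 'distance' keys. Key names are hypothetical.
    """
    # Keep only events inside the GRT window and the distance threshold.
    relevant = [
        e for e in events
        if grt_low <= e["time_diff"] <= grt_high and e["distance"] <= max_distance
    ]
    if not relevant:
        return None, 0
    if criterion == "closest in time":
        # Assumption: smallest time difference wins.
        best = min(relevant, key=lambda e: e["time_diff"])
    elif criterion == "nearest":
        best = min(relevant, key=lambda e: e["distance"])
    elif criterion == "score":
        if score_fn is None:
            raise ValueError("score_fn is required for the 'score' criterion")
        best = max(relevant, key=lambda e: score_fn(e["distance"], e["time_diff"]))
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    return best, len(relevant) - 1


# Illustrative events only, not taken from the project data.
events = [
    {"id": "A", "time_diff": 12.0, "distance": 8.0},
    {"id": "B", "time_diff": 30.0, "distance": 2.0},
    {"id": "C", "time_diff": 50.0, "distance": 1.0},
]
by_time, n_other_time = select_event(events, 0, 36, 10, "closest in time")
by_dist, n_other_dist = select_event(events, 0, 36, 10, "nearest")
```

In this toy example event C falls outside the 36-hour window, so the first two criteria choose between A and B only, and each leaves one other relevant event.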
For Modelling, we consider the following covariates, defined for each pair of FCM sample and most relevant hunting event:
We chose two different approaches to Modelling:
We do this separately for all three datasets (closest in time, nearest, and score).
| Model | Objective | Evaluation Metric | Max Depth | Eta | Gamma | Subsample | Colsample Bytree | Min Child Weight | Mean RMSE | SD RMSE | Number of Observations |
|---|---|---|---|---|---|---|---|---|---|---|---|
| last (closest in time) | reg:squarederror | rmse | 4 | 0.1635 | 5.850 | 0.5918 | 0.9921 | 4.640 | 168.6336 | 24.40957 | 149 |
| nearest | reg:squarederror | rmse | 4 | 0.1661 | 5.893 | 0.5956 | 0.9832 | 4.747 | 151.3186 | 17.91780 | 147 |
| score | reg:squarederror | rmse | 5 | 0.1744 | 5.834 | 0.6063 | 1.0000 | 4.766 | 147.9845 | 16.50250 | 223 |
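
The slides do not say how Mean RMSE and SD RMSE were obtained. Assuming they summarise per-fold errors from k-fold cross-validation, the computation could be sketched as below. The `fit_predict` callback and the mean-only baseline are hypothetical stand-ins for the tuned booster, and the response values are made up.

```python
import math
import statistics


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def kfold_rmse(y, k, fit_predict):
    """Return (mean, sd) of the per-fold RMSE over k interleaved folds.

    fit_predict(train_y, n_test) is a hypothetical callback that fits on the
    training responses and returns n_test predictions.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    folds = [list(range(i, len(y), k)) for i in range(k)]
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_y = [v for i, v in enumerate(y) if i not in held_out]
        test_y = [y[i] for i in test_idx]
        scores.append(rmse(test_y, fit_predict(train_y, len(test_y))))
    return statistics.mean(scores), statistics.stdev(scores)


def mean_baseline(train_y, n_test):
    """Placeholder model: predict the training mean for every test sample."""
    return [statistics.mean(train_y)] * n_test


# Illustrative response values only, not taken from the project data.
y = [310.0, 295.0, 402.0, 350.0, 288.0, 371.0, 333.0, 420.0, 305.0]
mean_rmse, sd_rmse = kfold_rmse(y, k=3, fit_predict=mean_baseline)
```

A mean-only baseline of this kind also gives a reference point: a tuned model is only informative to the extent that its cross-validated RMSE falls clearly below the baseline's.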
Family: Gamma
Log link for interpretability
Let \(i = 1,\dots,N\) be the indices of deer and \(j = 1,\dots,n_i\) be the indices of faecal samples for each deer
\[ \begin{aligned} \textup{FCM}_{ij} &\overset{\mathrm{iid}}{\sim} \mathcal{Ga}\left( \nu, \frac{\nu}{\mu_{ij}} \right) \quad\text{for}\; j = 1,\dots,n_i, \\ \mu_{ij} &= \mathbb{E}(\textup{FCM}_{ij}) = \exp(\eta_{ij}), \\ \eta_{ij} &= \beta_0 + \beta_1 \cdot \textup{number of other relevant hunting events}_{ij} \\ &\quad + f_1(\textup{time difference}_{ij}) + f_2(\textup{distance}_{ij}) \\ &\quad + f_3(\textup{sample delay}_{ij}) + f_4(\textup{defecation day}_{ij}) + \gamma_{i}, \\ \gamma_i &\overset{\mathrm{iid}}{\sim} \mathcal{N}(0, \sigma_\gamma^2) \quad\text{for}\; i = 1,\dots,N \end{aligned} \]
\(f_1, f_2, f_3, f_4\) are penalised cubic regression splines.
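As a short check of the parametrisation, reading \(\mathcal{Ga}(\nu, \nu/\mu_{ij})\) as shape \(\nu\) and rate \(\nu/\mu_{ij}\) (which is what the stated mean implies), the conditional mean and variance given \(\gamma_i\) are:
\[ \mathbb{E}(\textup{FCM}_{ij}) = \frac{\nu}{\nu/\mu_{ij}} = \mu_{ij}, \qquad \operatorname{Var}(\textup{FCM}_{ij}) = \frac{\nu}{(\nu/\mu_{ij})^2} = \frac{\mu_{ij}^2}{\nu} \]
The variance therefore grows with the square of the mean, and the coefficient of variation \(1/\sqrt{\nu}\) is the same for all observations.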
| Dataset | Term | Estimate | Std. Error |
|---|---|---|---|
| Closest in Time | (Intercept) | 5.824 | 0.053 |
| Closest in Time | NumOtherHunts | -0.137 | 0.061 |
| Nearest | (Intercept) | 5.812 | 0.054 |
| Nearest | NumOtherHunts | -0.103 | 0.060 |
| Highest Score | (Intercept) | 5.905 | 0.081 |
| Highest Score | NumOtherHunts | -0.016 | 0.014 |
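
Because the model uses a log link, the estimates above act multiplicatively on the expected FCM level once exponentiated. The sketch below converts the `NumOtherHunts` estimates into ratios with approximate 95% Wald intervals. These intervals are a rough normal approximation computed here for illustration, not the inference reported by the authors, and the function name is hypothetical.

```python
import math

# (estimate, standard error) on the log scale, copied from the estimates above.
coefs = {
    "Closest in Time": {"(Intercept)": (5.824, 0.053), "NumOtherHunts": (-0.137, 0.061)},
    "Nearest": {"(Intercept)": (5.812, 0.054), "NumOtherHunts": (-0.103, 0.060)},
    "Highest Score": {"(Intercept)": (5.905, 0.081), "NumOtherHunts": (-0.016, 0.014)},
}


def multiplicative_effect(estimate, std_error, z=1.96):
    """Return (ratio, lower, upper) on the response scale.

    The bounds are a Wald-type interval, exponentiated from the log scale.
    """
    return (
        math.exp(estimate),
        math.exp(estimate - z * std_error),
        math.exp(estimate + z * std_error),
    )


results = {
    dataset: multiplicative_effect(*terms["NumOtherHunts"])
    for dataset, terms in coefs.items()
}
for dataset, (ratio, lower, upper) in results.items():
    print(f"{dataset}: factor {ratio:.3f} per additional hunting event "
          f"(approx. 95% interval {lower:.3f} to {upper:.3f})")
```

A ratio below 1 means that each additional relevant hunting event is associated with a lower expected FCM level, holding the smooth terms and the deer-specific intercept fixed.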
| Category | Subcategory | Description |
|---|---|---|
| Diagnostics | QQ Plot | Residuals mostly follow expected distribution |
| Diagnostics | Residuals vs Predictor | No major pattern |
| Diagnostics | Histogram | Reasonable fit, some variance |
| Diagnostics | Observed vs Fitted | Moderate spread, some unexplained variance |
| Smooth Effects | Time & Space Effects | Weak or inconsistent |
| Smooth Effects | Sample Delay | Shows some effect |
| Linear Effects | Other hunting events | No significant impact |
How can spatial and temporal distance be minimised at the same time?
How can a larger part of the data be used?
Effect of Hunting on Red Deer